After the integration and dimension reduction in the previous worksheet, the integrated samples were clustered with the SLM algorithm (smart local moving, an improved Louvain algorithm). The cluster resolution was selected by evaluating the clustering and annotation of several runs and comparing the results with the original publication. Annotation was performed manually, based on a given marker set and an evaluation of highly expressed markers.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(Seurat)
library(ggplot2)
packageVersion("dplyr")
[1] '0.8.99.9003'
packageVersion("Seurat")
[1] '3.1.5'
### 1.2 Load data from the previous worksheet
bmmc_all <- readRDS(file = "./StoredRObj/bmmc_PreProc_Ref2.rds")
DefaultAssay(object = bmmc_all) <- "integrated"
Seurat’s clustering algorithms detect communities by maximizing modularity. In the first step, the shared nearest neighbor (SNN) graph is constructed.
bmmc_all <- FindNeighbors(bmmc_all, reduction = "pca", dims = 1:20)
Computing nearest neighbor graph
Computing SNN
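To illustrate what an SNN graph encodes, the following base R sketch builds one for a toy embedding. It is a simplified stand-in and not Seurat's implementation: the six cells, the value of `k` and the dense double loop are assumptions chosen for readability, and the pruning of weak edges that `FindNeighbors` applies is left out. The edge weight between two cells is the Jaccard overlap of their k-nearest-neighbor sets.

```r
# Toy data: 6 cells in a 2-D embedding (stand-in for PCA coordinates).
emb <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),   # group A
  c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1)    # group B
)
k <- 3  # neighborhood size, including the cell itself (illustrative value)

# k nearest neighbors of each cell, from Euclidean distances.
d <- as.matrix(dist(emb))
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# SNN weight = Jaccard overlap of the two cells' neighbor sets.
n <- nrow(emb)
snn <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    snn[i, j] <- shared / (2 * k - shared)
  }
}
round(snn, 2)
```

In this toy case, cells within a group share all their neighbors and get weight 1, while cells from different groups share none and get weight 0, so the graph separates into two blocks.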
Then the clusters are calculated from the SNN graph by optimizing a modularity function. The resolution was tuned to obtain as many annotatable clusters as possible while keeping the number of unannotated clusters low (resolution = 0.3). The SLM algorithm was selected (algorithm = 3).
bmmc_all <- FindClusters(bmmc_all, algorithm = 3, resolution = 0.3, random.seed = 19950927)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 76645
Number of edges: 3055173
Running smart local moving algorithm...
Maximum modularity in 10 random starts: 0.9611
Number of communities: 30
Elapsed time: 116 seconds
8 singletons identified. 22 final clusters.
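The quantity being maximized can be sketched in a few lines of base R. The function below computes the standard modularity with a resolution parameter for a given partition of a small graph. It is an illustration under assumptions: the function name `modularity_score`, the toy graph and the exact form of the quality function are not taken from the worksheet, and the optimizer used by Seurat may differ in detail.

```r
# Modularity of a partition: observed within-cluster edge weight minus the
# weight expected by chance, scaled by the resolution parameter.
modularity_score <- function(adj, membership, resolution = 1) {
  two_m <- sum(adj)                     # twice the total edge weight (undirected)
  degree <- rowSums(adj)
  same <- outer(membership, membership, "==")
  sum((adj - resolution * outer(degree, degree) / two_m) * same) / two_m
}

# Toy graph: two triangles joined by a single edge between nodes 3 and 4.
adj <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                  # make the adjacency matrix symmetric

q_split <- modularity_score(adj, c(1, 1, 1, 2, 2, 2))  # the two triangles
q_single <- modularity_score(adj, rep(1, 6))           # one big cluster
c(split = q_split, single = q_single)
```

Splitting the graph into its two triangles gives 5/14 (about 0.36), whereas putting all nodes into one cluster gives approximately 0. With this form of the function, a larger `resolution` increases the penalty on large clusters, so the optimum shifts toward more and smaller clusters.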
DimPlot(bmmc_all, reduction = "umap", label = TRUE, pt.size = 0.2, label.size = 10) + NoLegend()
Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
Please use `as_label()` or `as_name()` instead.
This warning is displayed once per session.
A given set of 55 markers was used to evaluate and annotate the clusters. All markers: (“AIGLC3”,“AVP”,“CCL5”,“CCR7”,“CD14”,“CD19”,“CD33”,“CD34”,“CD38”,“CD3D”,“CD3E”,“CD3G”,“CD4”,“CD74”,“CD79”,“CD79A”,“CD79B”,“CD8A”,“CD8B”,“CSF3R”,“CST3”,“DC74”,“DNTT”,“ELANE”,“FCER1A”,“FCGR3A”,“GATA1”,“GNLY”,“GYPA”,“HBA1”,“HBB”,“HBD”,“IGLC3”,“IL2RA”,“IL3RA”,“IL7R”,“JCHAIN”,“LYZ”,“MPO”,“MS4A1”,“MS4A7”,“MZB1”,“NCAM1”,“NKG7”,“PAX5”,“PF4”,“PLD4”,“PPBP”,“REXO2”,“RHAG”,“S100A9”,“SLC4A1”,“SOX4”,“SPI1”,“TCL1A”).
The following figure shows the violin plots and DimPlots for HBA1 (representing erythrocytes), CD3D (T-cells), CD79A (B-cells), AVP (HSPC), CD34 (precursors and HSPC), MS4A7 (monocytes), GNLY (NK-cells), FCER1A (DC) and PPBP (megakaryocytes, only in the DimPlot).
VlnPlot(bmmc_all, features = c("AVP", "HBA1", "CD3D", "CD79A", "CD34", "MS4A7", "GNLY", "FCER1A"), assay = "integrated", group.by = "seurat_clusters", ncol = 2, pt.size = 0.1) + NoLegend()
Warning: Could not find CD3D in the default search locations, found in RNA assay
instead
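The warning indicates that CD3D is not among the features of the integrated assay, which typically holds only the genes used for integration, so the values are taken from the RNA assay instead. A small check before plotting makes this visible for the whole marker panel. The sketch below uses hypothetical feature vectors so that it runs on its own; with the real object, the two vectors would presumably come from `rownames(bmmc_all[["integrated"]])` and `rownames(bmmc_all[["RNA"]])`.

```r
# Hypothetical feature names per assay (placeholders, not the real object).
integrated_features <- c("AVP", "HBA1", "CD79A", "CD34", "MS4A7", "GNLY", "FCER1A")
rna_features <- c(integrated_features, "CD3D", "PPBP")

# Markers requested in the violin plot.
markers <- c("AVP", "HBA1", "CD3D", "CD79A", "CD34", "MS4A7", "GNLY", "FCER1A")

# Sort the markers by where they can be found.
in_integrated <- intersect(markers, integrated_features)
rna_only <- setdiff(intersect(markers, rna_features), integrated_features)
not_found <- setdiff(markers, rna_features)

list(in_integrated = in_integrated, rna_only = rna_only, not_found = not_found)
```

Markers listed under `rna_only` are plotted on a different scale than the integrated values, which is worth keeping in mind when comparing panels.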